Matrix Decomposition Techniques for Data Privacy

نویسندگان

  • Jun Zhang
  • Jie Wang
  • Shuting Xu
چکیده

Data mining technologies have now been used in commercial, industrial, and governmental businesses, for various purposes, ranging from increasing profitability to enhancing national security. The widespread applications of data mining technologies have raised concerns about trade secrecy of corporations and privacy of innocent people contained in the datasets collected and used for the data mining purpose. It is necessary that data mining technologies designed for knowledge discovery across corporations and for security purpose towards general population have sufficient privacy awareness to protect the corporate trade secrecy and individual private information. Unfortunately, most standard data mining algorithms are not very efficient in terms of privacy protection, as they were originally developed mainly for commercial applications, in which different organizations collect and own their private databases, and mine their private databases for specific commercial purposes. In the cases of inter-corporation and security data mining applications, data mining algorithms may be applied to datasets containing sensitive or private information. Data warehouse owners and government agencies may potentially have access to many databases collected from different sources and may extract any information from these databases. This potentially unlimited access to data and information raises the fear of possible abuse and promotes the call for privacy protection and due process of law. Privacy-preserving data mining techniques have been developed to address these concerns (Fung et al., 2007; Zhang, & Zhang, 2007). The general goal of the privacy-preserving data mining techniques is defined as to hide sensitive individual data values from the outside world or from unauthorized persons, and simultaneously preserve the underlying data patterns and semantics so that a valid and efficient decision model based on the distorted data can be constructed. In the best scenarios, this new decision model should be equivalent to or even better than the model using the original data from the viewpoint of decision accuracy. There are currently at least two broad classes of approaches to achieving this goal. The first class of approaches attempts to distort the original data values so that the data miners (analysts) have no means (or greatly reduced ability) to derive the original values of the data. The second is to modify the data mining algorithms so that they allow data mining operations on distributed datasets without knowing the exact values of the data or without direct accessing the original datasets. This article only discusses the first class of approaches. Interested readers may consult (Clifton et al., 2003) and the references therein for discussions on distributed data mining approaches.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Iterative Matrix Factorization Techniques for High-Accuracy Privacy Protection on Non-negative-valued Datasets

1 Abstract— Powerful modern access to huge amounts of various data having high or low level of privacy brings out a concurrent increasing demand for preserving data privacy. The challenge is how to protect attribute values without jeopardizing the similarity between data objects under analysis. In this paper, we further our previous work on applying matrix decomposition techniques to protect pr...

متن کامل

Modified Privacy Preserving Data Mining System for Improved Performance

Privacy of information and security issues now-a-days has become the requisite because of big data. A novel framework for extracting and deriving information when the data is distributed amongst the multiple parties is presented by Privacy Preserving Data Mining (PPDM). The concern of PPDM system is to protect the disclosure of information and its misuse. Major issue with PPDM that exists is to...

متن کامل

Feature Selection: A Preprocess for Data Perturbation

As a major concern in designing various data mining applications, privacy preservation has become a critical component seeking a trade-off between mining performances and protecting sensitive information. Data perturbation or distortion is a widely used approach for privacy protection. Many privacy preservation approaches were developed, either by adding noises or by matrix decomposition method...

متن کامل

Privacy Preserving Clustering on Distorted data

In designing various security and privacy related data mining applications, privacy preserving has become a major concern. Protecting sensitive or confidential information in data mining is an important long term goal. An increased data disclosure risks may encounter when it is released. Various data distortion techniques are widely used to protect sensitive data; these approaches protect data ...

متن کامل

NNMF-Based Techniques for High-Accuracy Privacy Protection on Non-negative-valued Datasets

1 Abstract— Powerful modern access to a huge amount of various data having high or low level of privacy brings out a concurrent increasing demand for preserving data privacy. The challenge is how to protect attribute values without jeopardizing the similarity between data objects under analysis. In this paper, we further our previous work on applying matrix techniques to protect privacy and pre...

متن کامل

A Comparative Study on Data Perturbation with Feature Selection

As a major concern in designing various data mining applications, privacy preservation has become a critical component seeking a trade-off between mining utilities and protecting sensitive information. Data perturbation or distortion is a widely used approach for privacy protection. Either by adding noises or matrix decomposition methods, many algorithms were developed based on the simulation o...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009